fix(esc): worker-thread iteration for OpenAI-compat streaming (LiteLLM hang fix) #148

Merged
ericleepi314 merged 1 commit into main from fix/esc-cancel-openai-compat-worker-queue
May 16, 2026
Conversation

@ericleepi314 (Collaborator)

Summary

Why this is needed

The TypeScript reference at typescript/src/services/api/openaiShim.ts works because of JavaScript's native fetch + AbortSignal integration: reader.read() rejects with AbortError when the signal trips. Python's sync httpx has no equivalent. Two earlier attempts confirmed the diagnosis:

  • signal.pthread_kill(main_thread, SIGINT) DID unblock the iteration, but the KeyboardInterrupt propagated uncaught past every layer of exception handling, exiting the entire python -m src.cli process. SIGINT lands at the next bytecode boundary, which is rarely inside our narrow except KeyboardInterrupt window.
  • response.close() from another thread is best-effort across SDK + proxy + httpx version combinations. For LiteLLM-proxied chunked-transfer streams, the kernel-level socket read keeps receiving data even after the client-side response object is "closed."

The worker+queue pattern is the only portable design that makes the main thread's response time independent of the SDK's cooperation.
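A minimal sketch of the pattern, assuming a simple `threading.Event` abort flag (hypothetical names; the real implementation lives in src/providers/openai_compatible.py and raises the project's AbortError via the guard, not a bare RuntimeError):

```python
import queue
import threading

_DONE = object()  # sentinel: worker finished, normally or with an error


def stream_with_abort(sdk_stream, aborted, tick=0.1):
    """Yield chunks from sdk_stream, re-checking `aborted` every `tick` seconds."""
    q = queue.Queue()

    def worker():
        try:
            for chunk in sdk_stream:
                q.put(chunk)
        except BaseException as exc:  # route KeyboardInterrupt/SystemExit to main
            q.put(exc)
        finally:
            q.put(_DONE)

    threading.Thread(target=worker, daemon=True).start()

    while True:
        if aborted.is_set():
            raise RuntimeError("aborted")  # stand-in for the real AbortError
        try:
            item = q.get(timeout=tick)
        except queue.Empty:
            continue  # bounded wait: loop around and re-check the abort flag
        if item is _DONE:
            return
        if isinstance(item, BaseException):
            raise item  # re-raise the worker's exception on the main thread
        yield item
```

On abort the worker is simply orphaned: as a daemon thread it dies when the process exits or the upstream connection finally closes.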

Changes

  • src/providers/openai_compatible.py:

    • SDK iteration moved to a daemon worker thread
    • Main thread polls queue.Queue with 100ms timeout, checks guard.aborted between ticks
    • Worker catches BaseException so KeyboardInterrupt/SystemExit are routed back to main for re-raise
    • Abort sites use guard.raise_if_post_aborted() instead of hardcoded AbortError("user_interrupt") to preserve non-default abort reasons
    • Method docstring rewritten to explain the worker indirection (preventing future "simplifications")
  • tests/test_openai_compat_abort_signal.py — 2 new regression tests:

    • test_abort_unwinds_promptly_even_when_iterator_never_returns — uses a _StuckStream whose __iter__ blocks on a never-set threading.Event. Mutation-verified: reverting to pre-fix code made pytest hang past an 8-second gtimeout — confirming this test catches the LiteLLM regression.
    • test_normal_completion_still_captures_final_usage — pins the drain invariant. OpenAI emits usage stats in the final empty-choices chunk; the main thread must process every queued chunk before breaking on _DONE, otherwise the REPL's "↓ N tokens" spinner would silently lose count.
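A hedged reconstruction of what the stuck-stream test exercises (the real test lives in tests/test_openai_compat_abort_signal.py; `_StuckStream` and `drain` here are illustrative, not the actual code):

```python
import queue
import threading
import time


class _StuckStream:
    """Simulates the LiteLLM hang: iteration blocks on a never-set Event."""

    def __iter__(self):
        threading.Event().wait()  # never set, so this blocks forever
        yield  # never reached


def drain(stream, aborted, tick=0.1):
    """Minimal worker+queue drain; only models the abort path (no _DONE sentinel)."""
    q = queue.Queue()
    threading.Thread(
        target=lambda: [q.put(c) for c in stream], daemon=True
    ).start()
    while True:
        if aborted.is_set():
            raise TimeoutError("aborted")  # stand-in for the real AbortError
        try:
            q.get(timeout=tick)
        except queue.Empty:
            continue  # re-check the abort flag on the next tick


def test_abort_unwinds_promptly_even_when_iterator_never_returns():
    aborted = threading.Event()
    threading.Timer(0.2, aborted.set).start()  # simulate ESC after 200 ms
    start = time.monotonic()
    try:
        drain(_StuckStream(), aborted)
    except TimeoutError:
        pass
    assert time.monotonic() - start < 2.0  # unwound promptly, did not hang
```

Without the worker indirection, `drain` would block inside `next(stream)` on the main thread and the timing assertion could never run.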

Test plan

  • 2 new regression tests pass; stuck-stream test is mutation-verified
  • 34 abort-related tests pass (8 OpenAI-compat + 17 StreamAbortGuard + 5 Anthropic + 4 Minimax)
  • 4438 broader related tests pass
  • Critic subagent review: APPROVE (after applying docstring + abort-reason cleanups)
  • User manually verified on their LiteLLM → Claude Opus 4.7 stack — ESC now works promptly

Trade-offs

  • Wasted bandwidth on orphaned reads: the worker keeps consuming bytes until the upstream connection closes naturally. Practical impact: at most one cancelled stream of bandwidth per ESC press, reclaimed when the upstream finishes. Memory: queue.Queue is unbounded, so a worst-case "stuck iterator AND chunks flowing" path could accumulate megabytes. Flagged as a follow-up; not blocking on current evidence (the user's repro has the iterator stuck, not flowing).
  • 100 ms response cadence: ESC takes at most one tick to unwind. Imperceptible to users.

Out of scope (explicit user request)

Not extending the worker+queue to AnthropicProvider or MinimaxProvider. Both use the anthropic SDK and work reliably against direct connections; no user has reported proxy-buffering issues against them. The same pattern ports cleanly if needed in the future.

🤖 Generated with Claude Code

PRs #145 / #146 added mid-stream ESC cancellation for OpenAI-compatible
providers via a ``stream.response.close()`` listener. That works for
direct connections, but a user on LiteLLM-proxied Claude Opus 4.7
reported it still hangs indefinitely — ``response.close()`` from the
listener thread doesn't actually terminate the SDK's blocking socket
read in that configuration, so the iterator keeps draining bytes and
the agent loop's cancel boundary never sees the abort.

Two earlier attempts confirmed the diagnosis:

* Adding ``signal.pthread_kill(main_thread, SIGINT)`` to the listener
  DID unblock the iteration but propagated ``KeyboardInterrupt``
  uncaught past every layer of exception handling, exiting the entire
  ``python -m src.cli`` process. The signal hits the main thread at
  the next bytecode boundary, which is rarely inside our narrow
  ``except KeyboardInterrupt`` window.
* The TypeScript reference at
  ``typescript/src/services/api/openaiShim.ts`` works because
  JavaScript's native ``fetch + AbortSignal`` integration makes
  ``reader.read()`` reject with ``AbortError`` when the signal trips.
  Python's sync ``httpx`` has no equivalent.

Fix: decouple the main thread's response time from the SDK's
cooperation. The SDK iteration runs on a daemon worker thread that
pushes chunks into a ``queue.Queue``. The main thread polls with a
100 ms timeout and re-checks ``guard.aborted`` between ticks. On
abort, raise ``AbortError`` immediately and orphan the worker — it
dies when the underlying connection eventually closes. The cost is
some wasted bandwidth on the orphaned read; the benefit is that ESC
unwinds within ~100 ms regardless of LiteLLM / httpx / chunked-
transfer behavior.

Specifics:

* Worker pushes each chunk into the queue, then a ``_DONE`` sentinel
  in a ``finally`` so the main thread can break the loop on either
  normal exhaustion or worker exception.
* Worker catches ``BaseException`` so ``KeyboardInterrupt`` /
  ``SystemExit`` from the worker thread are routed back to the
  main thread for re-raise rather than silently dropped.
* Main-thread loop checks abort in the ``Queue.Empty`` branch (bounds
  ESC latency at one tick) AND after processing each chunk (preserves
  the chunk we just received, matching the in-loop-check semantics
  pinned by the existing chunk-list regression test).
* Abort sites call ``guard.raise_if_post_aborted()`` instead of
  hardcoding ``AbortError("user_interrupt")`` so a non-default abort
  reason on the signal (future ``"rate_limit_backoff"`` etc.) is
  preserved.
* Updated method docstring explains the worker indirection and the
  Python-vs-JS-fetch contrast so the next engineer touching this
  doesn't "simplify" the queue away.
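The abort-reason preservation can be sketched with a hypothetical guard (the real ``StreamAbortGuard`` interface is not shown in this PR; the fields and reason strings below are reconstructed from the description):

```python
import threading


class AbortError(Exception):
    """Carries the abort reason instead of a hardcoded 'user_interrupt'."""

    def __init__(self, reason):
        super().__init__(reason)
        self.reason = reason


class StreamAbortGuard:
    """Hypothetical guard: records a reason at abort time, raises it later."""

    def __init__(self):
        self._reason = None
        self._lock = threading.Lock()

    def abort(self, reason="user_interrupt"):
        with self._lock:
            self._reason = reason

    @property
    def aborted(self):
        return self._reason is not None

    def raise_if_post_aborted(self):
        # Re-raise with whatever reason was set, so a future
        # "rate_limit_backoff" abort is not masked as "user_interrupt".
        if self._reason is not None:
            raise AbortError(self._reason)
```

This is why the abort sites call the guard rather than constructing the exception inline: the reason travels with the signal, not with the call site.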

Two new regression tests:

* ``test_abort_unwinds_promptly_even_when_iterator_never_returns``
  uses a ``_StuckStream`` whose ``__iter__`` blocks on a never-set
  Event. Without the worker+queue, the main thread would block on
  ``next(stream)`` forever; mutation-verified by reverting to the
  pre-fix code and watching pytest hang past an 8-second gtimeout.
* ``test_normal_completion_still_captures_final_usage`` pins the
  drain invariant: the main thread must process the final
  empty-choices/populated-usage chunk before breaking on ``_DONE``,
  otherwise OpenAI's ``stream_options.include_usage=True`` token
  counts would silently regress.
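The drain invariant can be illustrated in miniature (queue contents and chunk shapes below are illustrative; the real loop also polls with a timeout and checks the abort guard between ticks):

```python
import queue

_DONE = object()  # sentinel the worker enqueues after the stream ends


def collect_usage(q, tick=0.1):
    """Drain every queued chunk; stop only when _DONE itself is dequeued."""
    usage = None
    while True:
        item = q.get(timeout=tick)
        if item is _DONE:
            return usage
        if item.get("usage"):  # final empty-choices chunk carries the totals
            usage = item["usage"]


# The final usage chunk is queued BEFORE _DONE, so breaking on the sentinel
# alone (rather than on an empty queue) guarantees it is processed.
q = queue.Queue()
q.put({"choices": [{"delta": {"content": "hi"}}], "usage": None})
q.put({"choices": [], "usage": {"total_tokens": 7}})
q.put(_DONE)
final_usage = collect_usage(q)
```

A loop that broke as soon as the queue looked empty, or as soon as choices were empty, would drop that last chunk and regress the token counts.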

Anthropic and Minimax providers are unchanged. They use the
``anthropic`` SDK whose ``client.messages.stream(...)`` works reliably
against direct connections, and no user has reported the LiteLLM-style
proxy buffering against them. If that changes, the same worker+queue
pattern ports cleanly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>